Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion

Authors

Takuya Higuchi

Takuya Yoshioka

Tomohiro Nakatani

Abstract

This paper concerns the use of speech enhancement to improve automatic speech recognition (ASR) performance in noisy environments. Speech enhancement systems are usually designed separately from the back-end recognizer by optimizing the front-end parameters with signal-level criteria. Such a disjoint processing approach is not always useful for ASR. Indeed, time-frequency masking, which is widely used in the speech enhancement community, sometimes degrades ASR performance because of the artifacts created by masking. This paper proposes a speech recognition-oriented front-end approach that optimizes the front-end parameters with an ASR-level criterion, where we use a complex Gaussian mixture model (CGMM) for mask estimation. First, the process of CGMM-based time-frequency masking is reformulated as a computation network. By connecting this CGMM network to the input layer of the acoustic model, the CGMM parameters can be optimized for each test utterance by backpropagation using an unsupervised acoustic model adaptation scheme. Experimental results show that the proposed method achieves a relative improvement of 7.7% on the CHiME-3 evaluation set in terms of word error rate.
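Two ideas in the abstract can be illustrated with a minimal sketch: scaling each time-frequency bin of a noisy spectrogram by a mask value, and expressing a word error rate change as a relative improvement. This is generic masking in plain Python, not the CGMM network or the adaptation scheme of the paper. The function names, the nested-list spectrogram layout, and all numbers are assumptions made for illustration only.

```python
def apply_tf_mask(noisy_stft, mask):
    """Scale each time-frequency bin of a complex spectrogram by a mask value.

    noisy_stft: list of frames, each a list of complex STFT coefficients.
    mask: same shape, real values; each value is clipped to [0, 1].
    (Hypothetical helper; the layout is an assumption, not from the paper.)
    """
    if len(noisy_stft) != len(mask):
        raise ValueError("mask and spectrogram have different frame counts")
    enhanced = []
    for frame, mask_frame in zip(noisy_stft, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("mask and spectrogram have different bin counts")
        # Clip the mask, then attenuate the bin; the phase is left unchanged.
        enhanced.append(
            [min(max(m, 0.0), 1.0) * y for y, m in zip(frame, mask_frame)]
        )
    return enhanced


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative word error rate reduction, in percent of the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy 2-frame, 2-bin spectrogram and mask (made-up values).
    noisy = [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j]]
    mask = [[1.0, 0.5], [0.0, 0.25]]
    print(apply_tf_mask(noisy, mask))
    # Made-up WER values, not results reported in the paper.
    print(relative_wer_improvement(10.0, 9.0))
```

A relative figure such as the one reported in the abstract is measured against the baseline error rate, so it cannot be converted to an absolute reduction without knowing that baseline.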

Related resources

Enhancement and optimisation of a speech recognition front end based on hidden Markov models

A method for performance evaluation of the acoustic-phonetic front end of a continuous speech recognition system, using the entropy of its output, is described. Results are given for a front end based on phonemic hidden Markov models, with various optional enhancements which have been optimised using the entropy criterion.

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

A speech enhancement model is used to map noisy speech to clean speech. In the training stage, an objective function is often adopted to optimize the model parameters. However, in most studies, there is an inconsistency between the model optimization criterion and the evaluation criterion on the enhanced speech. For example, in measuring speech intelligibility, most of the evaluation metric i...

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech

While spectral domain speech enhancement algorithms using non-negative matrix factorization (NMF) are powerful in terms of signal recovery accuracy (e.g., signal-to-noise ratio), they do not necessarily lead to an improvement in the quality of the enhanced speech in the feature domain. This implies that naively using these algorithms as front-end processing for e.g., speech recognition and spee...

Robust Speech Recognition in Reverberant Environment by Optimizing Multi-band Spectral Subtraction

A reverberant environment poses a problem in speech recognition applications, where performance degrades drastically depending on the extent of reverberation. Thus, it is important to employ front-end speech processing, such as dereverberation, to minimize its effect. Most dereverberation techniques used to address this problem enhance the reverberant waveform prior to speech recognition. Although t...

On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming

This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. In particular, the differentiation of complex-valued functions is a key component of this approach. Therefore, real-valued algorithmic differentiation is extended via the complex-valued chain rule. In addition to the basic mathematical operations the derivative of the eige...

Publication date: 2016
